Final Report AFOSR FA9550-13-1-0225:

Cloud-Based Perception and Control of Sensor Nets and Robot Swarms


Geoffrey  Fox,  David  Crandall 
Indiana  University,  March  2016 

1.  Introduction 


This project investigated the use of Cloud Computing as a key technology for Internet of Things (IoT) and DDDAS
applications. We developed an open source framework called IoTCloud [1] to connect IoT devices to cloud services
and used it to investigate algorithms controlling robots from the cloud. These were major applications needing the
additional computer power offered by the cloud, and we parallelized applications so that they could respond quickly
to the edge devices. This project produced six major papers [2-7] and a report [8]. It also contributed to the
STREAM2015 workshop and its final report [9]. We summarize highlights of the published work here and give some
comments on follow-on activities.


[Figure 1 panel text: batch and streaming components integrated in full job]

IoTCloud consists of: a set of distributed nodes running close to the devices to gather and do initial processing
of the data (sometimes called the fog layer), a set of publish-subscribe brokers to relay the information to the
cloud services, and a distributed stream processing framework (DSPF) coupled with batch processing engines in the
cloud to process the data and return (control) information to the IoT devices. Real time applications execute data
analytics at the DSPF layer, achieving streaming real-time processing. Our open-source IoTCloud platform [5] uses
Apache Storm [10] as the DSPF, RabbitMQ [11] or Kafka [12] as the message broker and an OpenStack academic
cloud [13] (or bare-metal cluster) as the platform. To scale the applications with the number of devices we need
distributed coordination among parallel tasks and discovery of devices; both were achieved with a ZooKeeper [14]
based coordination and discovery service.


Figure 1: IoTCloud Architecture

In general a real time application running in a DSPF can be modeled as a directed graph consisting of streams and
stream processing tasks. Stream tasks are at the nodes of the graph and streams are the edges connecting the
nodes. A stream is an unbounded sequence of events flowing
through  the  edges  of  the  graph  and  each  such  event  consists  of  data  represented  in  some  format.  The  processing 
tasks  at  the  nodes  consume  input  streams  and  produce  output  streams.  A  distributed  stream  processing  framework 
provides  the  necessary  API  and  infrastructure  to  develop  and  execute  such  applications  in  a  cluster  of  computation 
nodes.  The  main  tasks  of  a  DSPF  include  1)  Providing  an  API  to  develop  streaming  applications,  2)  Distributing  the 
stream  tasks  in  the  cluster  and  managing  the  life  cycle  of  tasks,  3)  Creating  the  communication  fabric,  4) 
Monitoring  and  gathering  statistics  about  the  applications,  and  5)  Providing  mechanisms  to  recover  from  faults. 
These  frameworks  generally  allow  the  same  task  to  be  executed  in  parallel  and  provide  rich  communication 
channels among the tasks. Some DSPFs allow the applications to define the graph explicitly and some create the
graph  dynamically  at  run  time  from  implicit  information. 
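
The graph model described above can be illustrated with a small in-process sketch. This is not the IoTCloud or Storm API; the class `StreamGraph`, its methods, and the task names are hypothetical and chosen only to show tasks as nodes and streams as edges, with each task consuming an input event and emitting zero or more output events.

```python
from collections import defaultdict, deque


class StreamGraph:
    """Toy single-process model of a streaming dataflow graph (hypothetical names)."""

    def __init__(self):
        self.tasks = {}                    # task name -> processing function
        self.edges = defaultdict(list)     # task name -> downstream task names
        self.outputs = defaultdict(list)   # events emitted by sink tasks

    def add_task(self, name, fn):
        self.tasks[name] = fn

    def connect(self, src, dst):
        self.edges[src].append(dst)

    def inject(self, task, event):
        """Push one event into `task` and propagate the results downstream."""
        queue = deque([(task, event)])
        while queue:
            name, ev = queue.popleft()
            for out in self.tasks[name](ev):        # a task may emit 0..n events
                if not self.edges[name]:            # sink task: record its output
                    self.outputs[name].append(out)
                for nxt in self.edges[name]:
                    queue.append((nxt, out))


graph = StreamGraph()
graph.add_task("parse", lambda ev: [float(ev)])
graph.add_task("filter", lambda ev: [ev] if ev > 1.0 else [])
graph.add_task("scale", lambda ev: [ev * 10])
graph.connect("parse", "filter")
graph.connect("filter", "scale")

for reading in ["0.5", "2.0", "3.5"]:
    graph.inject("parse", reading)

print(graph.outputs["scale"])   # prints [20.0, 35.0]
```

A real DSPF additionally distributes the tasks over a cluster, runs several instances of the same task in parallel, and handles faults, none of which this sketch attempts.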

For most streaming applications, latency is of utmost importance and the system should be able to recover from
faults fast enough for normal processing to continue with minimal effect on the applications. A detailed study of
recovery methods possible for streaming applications is available in [15]. In our work, we describe real time
applications that produce correct answers but violate timing requirements as having performance faults. Our
research addresses both explicit hardware/software faults and performance faults with the same mechanisms.

We are exploring cloud controlled real time IoT applications in two distinct dimensions. In one dimension there
are computationally intensive algorithms for processing device data that can benefit from cloud based processing
for real time response. These methods are powerful but impossible to run near the devices due to high
computational and specialized hardware requirements. In the other dimension there are applications that have to be
scaled to support vast numbers of devices and are inherently suitable for central data processing. We have
developed a parallel particle filtering based SLAM [16, 17] algorithm [6] and a deep learning based drone [18]
control algorithm [4], which both fit into the first category. As an application of the second category, we have
developed a robot swarm algorithm [3] for n-body collision avoidance [19-21] that can scale to a large number of
robots. In all three cases, we have working versions with good performance characteristics and papers published or
under consideration. The parallel SLAM and n-body collision avoidance algorithms use Turtlebot [22] as the robot
and ROS [23] as the SDK for connecting to the robot. The overall parallel SLAM application is shown in Figure 2.

Through our work in developing these applications, we have identified shortcomings in the current technologies,
based on current and future requirements. These imply IoTCloud extensions, termed IoTCloud++, that can give
scaling with performance guarantees and represent possible future research, illustrated by our recent extensions
[7] to Apache Storm to improve its communication performance. Our future work includes more extensive
performance testing and additional applications.

2. Streaming Application DDDAS Challenges for IoT Cloud Controller

We present five categories of streaming DDDAS applications based on the challenges they present to the backend
Cloud control system.

1) Set of independent events where precise time sequencing is unimportant. Example: independent search requests
or tweets from users.

2) Time series of connected small events where time ordering is important. Example: streaming audio or video;
robot monitoring.

3) Set of independent large events where each event needs parallel processing with time sequencing not critical.
Example: processing images from telescopes or light sources with material or biological sciences.

4) Set of connected large events where each event needs parallel processing with time sequencing critical.
Example: processing high resolution monitoring (including video) information from robots (self-driving cars)
with real time response needed.

5) Stream of connected small or large events that need to be integrated in a complex way. Example: tweets or
other online data where we are using them to update old and find new clusters rather than just classifying tweets
based on previous clusters as in 1), i.e. where we update the model as well as using it to classify events.


Figure 3a: Fluctuations in time of IoTCloud using RabbitMQ and Kafka with minimal processing in Storm
Figure 3b: Fluctuations in time of IoTCloud processing Kinect data from TurtleBot with RabbitMQ

Figure 4: a) Performance of cloud-based deep learning and b) typical region split and recognition of multiple
objects in a single image.

These five categories can be considered for single or multiple heterogeneous streams. Our initial work has
identified difficulties in meeting real time constraints in cloud controlled IoT due to either the intrinsic time
needed to process events or fluctuations in processing time caused by virtualization, multi-stream interference and
messaging fluctuations. Figure 3a shows the fluctuations we observed with RabbitMQ and Kafka with minimal
processing in Apache Storm, and Figure 3b shows fluctuations in processing 3D point cloud Kinect data in Storm from
a Turtlebot with RabbitMQ. Large computational complexity in event processing is naturally addressed by using
parallelism in the Storm bolts, but that can also lead to further sensitivity to fluctuations. Currently IoTCloud
can handle category 1) automatically and category 3) with user designed parallelism. The other cases require
careful tuning on a case by case basis and can still see unexpectedly large fluctuations in processing time, which
we currently do not address except by over-provisioning.

Category 4) is illustrated by our work on deep learning for drones. The idea is that state of the art deep
learning-based object detectors can distinguish among hundreds of object classes, and this capability would be very
useful for mobile devices, including robots. However, as a model for a single object can have billions of
parameters, the compute requirements are enormous, with classification requiring ~20 sec/image on a high-end CPU
and ~2 sec/image on a high-end GPU. Our results using Regions with Convolutional Neural Networks (R-CNNs)
trained on ImageNet are shown in Figure 4. Note that for this problem communication latency is unimportant, as the
cloud processing time is so long.

A future IoTCloud++ will enhance IoTCloud to allow real-time guarantees and fault tolerance in both execution
and performance. We will achieve this autonomic behavior by allowing dynamic replication and elastic parallelism
in a self-monitored environment. This work will be delivered as an enhancement to Storm, extending the work in
[7].

4.  Related  Work 

Industry is realizing the need for data analytics driven approaches to support efficient operations at all levels,
reduce costs and increase innovation. Machines are becoming intelligent, with software controls and
communication to outside services. Industry can benefit immensely from real time central management to deploy,
manage, upgrade, and decommission these intelligent machines. Concepts like Brilliant Machines [24] by GE
Software are pushing the industry towards such connected and intelligent infrastructure. A Brilliant Machine
connected to the Industrial Internet of Things can run software that makes the machine react to changes in data
and its environment, both in operation and configuration, and can communicate with other machines. Software
Defined Machines (SDM) is a software environment for programming such machines with a generic API that hides
underlying details such as the hardware. An SDM for a Brilliant Machine can run close to the machine or can be
hosted in the cloud. Having generic distributed open platforms such as IoTCloud to execute both data analytics and
SDMs in the cloud will be beneficial for such applications.

Distributed  stream  processing  provides  frameworks  to  deploy,  execute  and  manage  event  based  applications  at 
large  scale.  Many  years  of  research  [8]  have  produced  software  frameworks  capable  of  executing  distributed 
computations  on  top  of  event  streams.  Examples  of  such  early  event  stream  processing  frameworks  include  Aurora 
[25], Borealis [26], StreamIt [27] and SPADE [28]. With the emergence of Internet scale applications in recent
years, new distributed stream processing systems like Apache S4 [29], Apache Storm [10], Apache Samza [30],
Spark Streaming [31] and commercial solutions including Google MillWheel [32] and Amazon Kinesis [33] have
been developed.

Apache Storm applications are developed in the dataflow graph model we introduced earlier. A Storm
application consists of Spouts, Bolts, and Streams. Spouts and Bolts are the nodes in the graph, connected by
streams, and a single such application is called a Topology. Storm uses its own servers to manage and distribute the
tasks among the cluster nodes. The communication fabric is built on top of TCP using the Netty library. Storm
provides at-least-once processing guarantees at its core. Apache Samza is another open source stream-processing
framework, developed on top of the Kafka message broker and Apache Yarn. Samza applications are similar to Storm
applications in graph structure; the differences between Storm and Samza lie in the technical details of how they
distribute tasks and manage communications. Because the Samza messaging layer is backed by the file based
message broker Kafka, its latency is expected to be higher than that of other processing engines.
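
To make the at-least-once guarantee concrete, the following is a minimal sketch of the general idea (replay until acknowledged), not Storm's actual acking mechanism. The names `ReplayingSource`, `run` and `flaky_store` are hypothetical. The fault is injected after the side effect but before the acknowledgement, so one event is delivered twice; the keyed write keeps the result correct despite the duplicate.

```python
class ReplayingSource:
    """Hypothetical source that re-emits events until they are acknowledged."""

    def __init__(self, events):
        self.pending = dict(enumerate(events))   # event id -> event awaiting ack

    def next_batch(self):
        return list(self.pending.items())

    def ack(self, event_id):
        self.pending.pop(event_id, None)


def run(source, process, max_rounds=10):
    """Deliver pending events until all are acked or max_rounds is reached."""
    attempts = 0
    for _ in range(max_rounds):
        batch = source.next_batch()
        if not batch:
            break
        for event_id, event in batch:
            attempts += 1
            try:
                process(event_id, event)
                source.ack(event_id)
            except RuntimeError:
                pass   # no ack: the event stays pending and is replayed
    return attempts


seen = {}          # keyed store: writing the same event twice is harmless
deliveries = []    # order in which event ids were delivered


def flaky_store(event_id, event):
    deliveries.append(event_id)
    seen[event_id] = event
    # Simulated fault after the side effect, on the first delivery of event 1.
    if event_id == 1 and deliveries.count(1) == 1:
        raise RuntimeError("fault before ack")


attempts = run(ReplayingSource([10, 20, 30]), flaky_store)
print(attempts, deliveries, sum(seen.values()))   # prints 4 [0, 1, 2, 1] 60
```

Had `flaky_store` appended to a list or incremented a counter instead of writing by key, the replay of event 1 would have been counted twice, which is why downstream logic under at-least-once delivery is usually made idempotent.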

Apache  Spark  streaming  extends  the  Spark  batch  processing  system.  Spark  is  a  batch  processing  system  targeting 
iterative algorithms and interactive analytics problems on top of large data sets. In the streaming case, Spark reads
input  from  a  stream  source  like  a  message  queue.  It  uses  small  batches  of  incoming  data  as  input  to  the  running 
jobs,  creating  the  illusion  of  continuous  processing.  Such  batching  of  the  inputs  is  not  very  attractive  for  real  time 
applications.  S4  is  another  fully  distributed  real  time  stream  processing  framework.  The  processing  model  is 
inspired  by  map-reduce  and  uses  a  key-value  based  programming  model.  S4  creates  a  dynamic  network  of 
processing  elements  (PEs)  and  these  are  arranged  in  a  DAG  at  runtime.  One  of  the  biggest  challenges  in  the  PE 
architecture  is  that  key  attributes  with  very  large  domains  can  create  large  numbers  of  PEs  in  the  system  at  any 
given  time. 
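
The cost of batching for real time applications can be seen in a simplified model: if events are only processed when their batch closes, each event waits until the next batch boundary before any processing starts. The sketch below computes that waiting time; the function name, the arrival times and the 500 ms interval are illustrative assumptions, not measurements from this project, and processing time is ignored.

```python
def microbatch_wait(arrival_times_ms, interval_ms):
    """Waiting time of each event until its batch closes at the next boundary."""
    waits = []
    for t in arrival_times_ms:
        # Integer ceiling of t / interval_ms, then back to a timestamp.
        batch_close = ((t + interval_ms - 1) // interval_ms) * interval_ms
        waits.append(batch_close - t)
    return waits


arrivals = [10, 250, 400, 990]             # hypothetical arrival times in ms
waits = microbatch_wait(arrivals, 500)     # hypothetical 500 ms batch interval

print(waits)                      # prints [490, 250, 100, 10]
print(max(waits))                 # prints 490
print(sum(waits) / len(waits))    # prints 212.5
```

In this model the worst-case added latency approaches the full batch interval, which is the reason a per-event engine is preferable when individual event latency matters.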

A comprehensive list of optimizations that can reduce the latency of stream processing applications is given in
[34]. These optimizations include operator reordering, load balancing, fusion, fission, etc. All of them target
the average performance metrics of the system. For real time applications, individual tuple latency is also very
important.

There are many open source message brokers available that can act as gateways to the stream processing platforms.
Such brokers include ActiveMQ [35], RabbitMQ [11], Kafka [12], Kestrel, and HornetQ. ActiveMQ, RabbitMQ,
Kestrel and HornetQ are all in-memory message brokers with optional persistent storage. On the other hand, Kafka
is a store-first broker backed by a message log. Compared to other message brokers, Kafka has better parallel
consumption semantics, scalability and fault tolerance due to its topic partitioning and replication across the
cluster. Our measurements [2] showed that RabbitMQ, illustrated in fig. 2, has comparable or superior performance
compared to other brokers and that Kafka has large fluctuations in latency. We should revisit this question when
the performance enhancements of IoTCloud++ are implemented.

Implementing real time applications with critical time requirements in the vanilla Java virtual machine is a
challenge in itself due to garbage collection and other unpredictable factors. There have been efforts to improve
the Java runtime and JDK to fit these requirements [36-38]. Most of these studies relate to real time requirements
in embedded systems that control the devices. In our platform the actual software controlling the IoT devices will
be running near the device, and the cloud processing will enhance this processing for stages where some latency
(~few 100 ms) can be tolerated.

Robot  Operating  System  (ROS)  is  an  open  source  platform  that  offers  a  set  of  software  libraries  to  build  robotics 
applications.  Popular  off  the  shelf  robots  have  ROS  applications  already  written  and  these  applications  combined 
with  the  available  wide  range  of  tools  such  as  visualization  tools  and  simulators  create  a  powerful  environment  for 
researchers. In some of our cloud applications, we use ROS as the first layer to connect to the robot, collect data
and  control  it.  We  transform  the  ROS  data  structures  to  data  structures  required  by  cloud  applications  at  the 
gateways. 

Open standards like MQTT [39] and MTConnect [40] are being developed to bridge the gap between the
application data requirements and the device data. IoTCloud supports the MQTT transport and can transfer data
between gateways and cloud using MQTT. If the devices send data with the MQTT protocol, the data can be sent
directly to the cloud without transformation at the gateways.


5. Proposed Research Plan for Robust Open Source Cloud IoT Controller
with Real Time QoS


Our research has identified the need to achieve real-time QoS in spite of fluctuations in computation time, and we
have designed IoTCloud++ to address this, although we did not have the resources to pursue it beyond the recently
published extensions to Storm [7]. The architecture of the new IoTCloud++ platform is shown in Figure 5. In this
architecture, we propose to dynamically replicate the streaming computation tasks within cloud clusters to achieve
good performance in at least one replica. This replication will not be universal but rather done only when
achieving QoS demands it, for example when monitoring shows that the initial task is delayed. This as-needed
replication will drastically reduce the replication overhead in many cases. We will dynamically identify the
streaming tasks that require replication and replicate at the task level rather than at the streaming application
level. This dynamic replication of streaming tasks will be implemented for Apache Storm, described above. Apache
Storm consists of two types of servers called Nimbus and Supervisor. Nimbus manages the streaming applications
running in the cluster. Each Supervisor consists of a fixed number of workers capable of executing the stream
tasks belonging to a job. To dynamically increase the number of Storm servers, we will use a resource manager such
as Apache Yarn. Apache Storm has already been ported to run on top of Apache Yarn, and we will extend this
framework to support elastic cluster resizing.
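
The as-needed replication policy described above can be sketched as a small deterministic model. This is an illustration of the idea only, not the proposed Storm extension: the function `completion_time`, the monitoring threshold and all timings are hypothetical. A replica is launched only when the primary task has not finished by the monitoring threshold, and the result is taken from whichever copy finishes first.

```python
def completion_time(primary_ms, replica_ms, check_ms):
    """Finish time of a task when a replica is launched only if the primary is late.

    primary_ms: time the primary task takes on this event
    replica_ms: time a replica takes once it has been started
    check_ms:   monitoring threshold after which a replica is launched
    Returns (finish_time_ms, replica_launched).
    """
    if primary_ms <= check_ms:
        return primary_ms, False                        # on time: no replica needed
    return min(primary_ms, check_ms + replica_ms), True


# Hypothetical per-event processing times; the third event hits a slow node.
primary_times = [40, 45, 400, 50]
results = [completion_time(p, replica_ms=40, check_ms=60) for p in primary_times]

finish_times = [t for t, _ in results]
replicas_launched = sum(1 for _, launched in results if launched)

print(finish_times)        # prints [40, 45, 100, 50]
print(replicas_launched)   # prints 1
```

In this toy run the worst-case latency drops from 400 ms to 100 ms while only one of the four events incurs replication cost, whereas replicating every task would double the resources used for all four.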

A resource management framework such as Yarn only works with the allocated computation resources. We will use
the IaaS layer to dynamically scale computation nodes in the cluster. We have extensive expertise at the
infrastructure level, where we can instantiate systems on demand that can then support the dynamic scaling of the
system. We will explore Google Compute Engine for the infrastructure level. Streaming computation nodes will
be managed by the resource management layer, and this will be controlled by a separate component. We can use
either the messaging system or a distributed key-value store to replicate the state, as is done in MillWheel [32].
From fig. 3, the fluctuations in time at the broker are much smaller than those in the processing stage for
RabbitMQ but significant for Kafka. We will monitor the performance of the brokers and scale them at runtime to
minimize such effects on the system. A controller will then directly use the IaaS infrastructure to scale the
brokers as needed by increasing the number of assigned VMs.

To scale an application that receives input from multiple sources as a single stream and needs to differentiate each
source, the larger stream must be partitioned into sub streams according to the source. This can be done with
current processing frameworks, but when parallel processing and state tracking are needed, the user code becomes
complex. Also, for parallel processing the scheduling is not adequate because each task will get a sub task for
every sub stream. We will solve this by introducing new data abstractions and scheduling at the sub stream level
and task level. The IoTCloud project is largely built on top of Apache Open Source projects. We have extensive
experience in working with Apache projects (as users, committers and ASF members) and will contribute the results
of this research back to the open source community.
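
As an illustration of partitioning a combined stream into per-source sub streams, the sketch below routes every event of a given source to the same worker and keeps the events of each source in arrival order. It shows only the basic partitioning step that current frameworks already allow, not the proposed sub stream abstractions; the function names, the CRC32-based routing and the source identifiers are assumptions made for the example.

```python
import zlib
from collections import defaultdict


def assign_worker(source_id, num_workers):
    """Stable hash so that every event of one source goes to the same worker."""
    return zlib.crc32(source_id.encode("utf-8")) % num_workers


def partition(events, num_workers):
    """Split (source_id, payload) events into per-worker, per-source sub streams."""
    workers = defaultdict(lambda: defaultdict(list))   # worker -> source -> payloads
    for source_id, payload in events:
        workers[assign_worker(source_id, num_workers)][source_id].append(payload)
    return workers


# Hypothetical combined stream from three robots.
events = [("robot-a", 1), ("robot-b", 5), ("robot-a", 2), ("robot-c", 7), ("robot-b", 6)]
workers = partition(events, num_workers=2)

# Flatten to check that each source ended up as exactly one ordered sub stream.
substreams = {src: evs for w in workers.values() for src, evs in w.items()}
print(substreams["robot-a"], substreams["robot-b"], substreams["robot-c"])
```

Because the routing depends only on the source identifier, per-source state can live with the worker that owns the sub stream. The sketch leaves out load balancing across workers and scheduling when sources outnumber workers.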